Insights from a workplace SARS-CoV-2 specimen collection program, with genomes placed into global sequence phylogeny

In 2020, the Department of Energy established the National Virtual Biotechnology Laboratory (NVBL) to address key challenges associated with COVID-19. As part of that effort, Pacific Northwest National Laboratory (PNNL) established a capability to collect and analyze specimens from employees who self-reported symptoms consistent with the disease. During the spring and fall of 2021, 688 specimens were screened for SARS-CoV-2, with 64 (9.3%) testing positive using reverse-transcriptase quantitative PCR (RT-qPCR). Of these, 36 samples were released for research. All 36 positive samples released for research were sequenced and genotyped. Here, the relationship between patient age and viral load as measured by Ct values was measured and determined to be only weakly significant. Consensus sequences for each sample were placed into a global phylogeny and transmission dynamics were investigated, revealing that the closest relative for many samples was from outside of Washington state, indicating mixing of viral pools within geographic regions.


Introduction
SARS-COV-2 coronavirus has caused nearly 80 million cases of COVID-19 in the United States since the onset of the global pandemic in late 2019/early 2020, and approximately 975,000 deaths [1]. COVID-19 can lead to severe negative health outcomes in recovered patients, including neurological [2,3] and cardiovascular [4,5] effects, with an estimate that as many as 80% of recovered COVID-19 patients experience some form of long-term health consequence including fatigue, headaches, attention disorder, hair loss, and dyspnea (difficulty breathing) [6]. Among those infected by COVID-19, advanced age is associated with increased disease severity and negative outcomes [7,8].
In 2020, the US Department of Energy established the National Virtual Biotechnology Laboratory (NVBL) in March 2020 to address key challenges associated with the COVID-19 crisis [9]. NVBL brought together the broad scientific and technical expertise and resources of DOE's 17 national laboratories to help tackle medical supply shortages, discover potential drugs to fight the virus, develop and validate COVID-19 testing methods, model disease spread and impact across the nation, and understand virus transport in buildings and the environment. As part of that effort, Pacific Northwest National Laboratory, located in Richland WA, established a capability to conduct on-campus collection and analysis of nasopharyngeal specimens from employees self-reporting symptoms of COVID-19. In Washington state, USA, approximately 1.45 million cases and 12,000 deaths have been reported as of March 2022 [10]. Between January and October of 2021, 36 specimens collected by PNNL were identified as SARS-COV-2 positive by two quantitative PCR assays (N1 and N2) and submitted to the University of Washington Medicine's Virology Laboratory (Seattle WA) for genotyping and sequencing. An additional assay for human RNAse P (Rp) was used as an internal control. Our initial goal was to investigate the relationship between patient age and viral load (as measured by RT-qPCR cycle threshold (Ct) values), as suggested in [11] and/or SARS-COV-2 lineage. Here, we report the results of these efforts and place the samples in a global phylogeny.

Materials and methods
This work was approved by the PNNL institutional review board (IRB number 2020-12 / PNNL000279) and informed consent given by all patients. No minors were involved in this study. Sampling was conducted at the on-site occupational health clinic at PNNL. A form containing information about the study and requesting written consent for research use was provided to patients when they registered for voluntary COVID testing. Samples were collected and processed according to CDC guidelines [CDC 2019-Novel Coronavirus (20190CoV) Real-Time RT-PCR Diagnostic Panel; [12]]. Steps are briefly outlined below.

Sample collection and preparation
Nasopharyngeal swabs were collected at the PNNL onsite clinic and stored at 2-8˚C for up to 36 hours after receipt by the laboratory. Any samples requiring additional time for processing were stored at -70˚C for no more than 72 hours. Samples were processed inside of a class II biosafety cabinet at Biosafety Level 2 (BSL-2) controls. Viral nucleic acids were extracted using a Qiagen QIAamp Viral RNA Mini Kit according to manufacturer's instructions. Of 688 samples collected between January and October 2021, 64 were positive for SARS-CoV-2 and 36 samples were released for research. All 36 positive samples released for research were sequenced and genotyped.

Quantitative PCR and statistical analysis
Reverse transcriptase-quantitative qPCR assays were prepared inside of a PCR workstation according to manufacturers' instructions. Sample nucleic acids, including positive and negative controls, were added to TaqPath 1-Step RT-qPCR Master Mix (ThermoFisher) and 2019-nCoV CDC Probe and Primer Kit for SARS-COV-2 (Biosearch Technologies) in 96-well plates and sealed with MicroAmp optical adhesive film (Applied Biosystems). RT-qPCR was carried out on an Applied Biosystems 7500 Fast Dx Real-Time PCR instrument using the following cycles: UNG incubation 25˚C for 2 minutes, RT incubation 50˚C for 15 minutes, enzyme activation 95˚C for 2 minutes, 45 amplification cycles 95˚C for 3 seconds/55˚C for 30 seconds.
All statistical analyses of Ct values (i.e., Pearson correlations and Wilcoxon ranked-sum tests) were performed using R v4.2.0 and packages ggsignif v0.6.3 and ggpubr v0.4.0. Tests of normality for age cohorts was performed using the Shapiro-Wilk test function of base R.

Library preparation and sequencing
Viral nucleic acids were submitted to the University of Washington Medicine's Virology Laboratory for sequencing. Libraries were prepared using Swift BioSciences SARS-CoV-2 amplicon panel according to manufacturer's instructions, resulting in 345 amplicons. Libraries were sequenced using an Illumina NextSeq500 in paired-end mode.

Alignment and local phylogenetic tree generation
Consensus sequences were aligned in Molecular Evolutionary Genetics Analysis (MEGA) v10.2.6 [19,20] using the MUSCLE algorithm. Maximum-likelihood tree of samples was generated in MEGA using the Tamura-Nei model of substitutions and assuming uniform rates, with 500 bootstraps and Wuhan-Hu-1 as root.

Statistical analysis
36 samples collected between January and October 2021 from patients aged 19-71 were determined to be positive by qPCR assays targeting viral N1 and N2 genes, with an additional Rp internal control (Table 1, Fig 1). Using these data, Pearson correlation between patient age and cycle threshold (Ct; Fig 1) values for all three assays was investigated using linear regression. No significant correlation was found between age and Ct value (inset, Fig 1).
To assess the relationship more finely between age and Ct values, qPCR results were binned for each assay into decades by birth years Together, these data suggest that while age is associated with more severe COVID-19 symptoms and morbidity [7,8], it does not necessarily correlate with viral load in all patients.

PLOS ONE
Analysis of SARS-COV-2 workplace specimens

Analysis of SARS-COV-2 workplace specimens
We also investigated whether there existed a relationship between patient gender and mean Ct values for each assay. No significant differences as determined by Wilcoxon ranked-sum testing between Ct values of male and female patients existed for any assay (Fig 3).
These results are in contrast to Levine-Tiefenbrun et al [11], who reported differences in cycle threshold for N gene detection assays between patients age 40 and above and those younger than 40 during the first four days post-diagnosis, with older patients exhibiting lower Ct values, especially in men. To investigate the possibility that a similar pattern might exist in our data, samples were sorted into the same age categories as Levine-Tiefenbrun and differences in mean Ct values were determined using Wilcoxon ranked-sum test. Interestingly, although mean Ct values were higher in the older cohort for all three assays (the opposite pattern observed by Levine-Tiefenbrun), these differences were only significant for the Rp assay (Fig 4). Removal of two samples with outlier Ct values for N1 and N2 assays, V341233359 and V341233381, did not result in statistically significant differences in these assays between age groups (p = 0.099 and p = 0.109, respectively), but did result in the difference in Ct values for Rp assay between age groups losing statistical significance (p = 0.067). It is not clear why the pattern of these values differs from previously published results. One possibility is a difference in variants detected within the two groups, but examination of variants within the two cohorts does not indicate this to be the case. Another potential explanation is that Levine-Tiefenbrun performed testing after a formal diagnosis of SARS-CoV-2 infection, while our study was based on patients volunteering to be tested without a formal diagnosis-which could have resulted in testing during different phases of infection. It is also not clear why the mean difference in Ct values for Rp assay is significant. Gene expression in general is known to change as a function of age in humans [e.g., 27]; we are however not aware of any studies showing agedependent changes in expression of RNAse P itself.
To investigate the possibility of gender-specific differences within the same cohorts we further parsed Ct data for each qPCR assay into categories by male and female for a total of four cohorts: female 40 years of age or older, female younger than 40, male 40 or older, and male younger than 40. Comparisons were made in a pairwise manner between all four cohorts. As before, although the mean Ct values for each assay were higher for the older cohort in males and females, these differences were not significant as measured by Wilcoxon ranked-sum testing. Additionally, no significantly different mean Ct values were identified between any of the six possible gender/age pairs using Wilcoxon ranked-sum testing (Fig 5). Removal of the same

PLOS ONE
Analysis of SARS-COV-2 workplace specimens two outlier samples as above resulted in significant differences in N1 Ct values for females in different age groups (p = 0.025). Additionally, differences in N2 Ct values were significant for females in different age groups (p = 0.009) and for females younger than 40 compared to males 40 and over (p = 0.029). In our hands, therefore, the relationship between age or age/gender

PLOS ONE
Analysis of SARS-COV-2 workplace specimens on Ct values for these assays is not strong, and it remains unclear why our results differ from those of earlier researchers.

Analysis of SARS-COV-2 lineages
Lineages were determined by UW Medicine's Virology Laboratory, based on mutations in spike protein (Table 1). One sample, 517711, failed sequencing standards at UW and was therefore not sequenced or genotyped. Identified genotypes in this dataset correspond with circulating genotypes at the time of sampling. During early 2021, a relatively small number of samples were identified as CDC variants being monitored: two epsilon (B.1.2, V341233246; B.1.429, V341233343) and three alpha variants (B.1.1.7, V349346303; B.1.1.7, V349346041; Q.1, V349346159). However, by late 2021 two samples were identified as being a variant being monitored (V349346383 and V349346212, B.1.621/Mu) and all others were identified as a variant of concern, delta. Overall, these results indicate general concordance between this dataset and overall historical data concerning variant prevalence.
To better understand the evolutionary relationship between SARS-COV-2 samples in this dataset, a maximum-likelihood phylogeny was constructed from consensus sequences using Wuhan-Hu-1 as a root (Fig 6). Bootstrap values and low map distance in this tree indicate strong relatedness among the samples, as may be anticipated from data collected in a single geographic area. As expected, based on knowledge of variants in each sample, there was a clear break point between samples collected between January-March and August-October 2021 with three alpha variant samples emerging during April 2021. Two samples, V349346128 and V349346451 are of the same lineage (AY.25.1) and appear by this analysis to be virtually identical, raising the possibility of transmission between close contacts; during this time period, one work-based case of transmission was documented. An additional sample, V349346065 has a strong bootstrap value placing it outside the main delta variant clade, suggesting the possibility that this infection took place outside of the immediate geographic area.
Finally, consensus sequences were placed into an existing global phylogeny comprising 8,790,585 worldwide genomes using UShER [21] (Fig 7). In this analysis, samples fall into the expected clades by variant (e.g., delta clade, alpha clade, etc.). Interestingly, samples from this effort in many cases did not cluster together but were in fact placed into divergent lineages within clades. To investigate this phenomenon more fully, closest genetic relatives were identified (Table 2). In many cases the closest relative to a given sample as determined by UShER were not from Washington state, indicating infection in other regions within the United States with subsequent testing in WA, or possibly infection within WA from sources traveling from elsewhere. Geographic separation during infection would also explain why the samples shown in Fig 7 do not cluster together. Taken together, the results presented here provide a snapshot into regional SARS-COV-2 transmission within Washington state, USA during the spring and fall of 2021. While there was no overall correlation between patient age and viral load as measured by qPCR Ct values, certain cohorts of patient samples did exhibit subtly different Ct values for N1 and N2 assays. However, the significance of these differences is generally low and similar to differences in Rp control gene expression between the same cohorts. Additionally, no significant relationship was found between patient age and sampling date or variant identified, between sampling date and Ct values, or between Ct values and patient gender. Additional insight into transmission dynamics during this period was obtained by placing each sample into a global phylogenetic context, which indicated significant geographic variability. Consistent with observed geographic variability, there was only a single confirmed case of workplace transmission in the samples examined, and together these highlight the risks posed by travel during COVID-19 surges.